ASM Implementation of Blasting the Bounds

By Alexander Toresson

Yesterday I read Adok's article in Hugi 27 about how to use bigger integers than supported in a compiler. IMO his implementation was very far from the optimal. First, I'd make it use the full bytes, rather than the way he's doing it, storing a decimal char in each byte. His way is a waste of space and make's the implementation quite slow. There's a huge difference using the string "18446744073709551615" (20 digits/bytes) than the bytes 0xFFFFFFFFFFFFFFFF (8 bytes). Then we can process the number in dwords (on a 32-bit machine) or qwords (on a 64-bit machine). That means that either 1 or 2 passes is made on the number, instead of 20. Second, I'd limit the number to a fixed number of bits, for example a 128-bit number. That means an easier implementation and faster code. You must know how many bits you're going to use, if you're not programming a scientific calculator. Third, I'd implement it in ASM. That means a more elegant implementation and faster code.

IMO, 64-bit numbers are enough for most things done, so here goes:

Implementation of adding two 64-bit numbers, a and b, to each other:

eax contains low dword of a, ebx the high dword
ecx contains low dword of b, edx the high dword

ADD eax, ecx
ADC ebx, edx

eax contains low dword of output, ebx the high dword

That's it. Similarly, subtracting two numbers:
(same inputs and outputs)

SUB eax, ecx
SBB ebx, edx

Multiplying two 64-bit numbers:

esi contains low dword of a, edi the high dword
ebx contains low dword of b, ecx the high dword
x and y are variables

MOV eax, esi
MUL ebx
MOV [x], eax
MOV [y], edx

MOV eax, edi
MUL ebx
ADD [y], eax

MOV eax, esi
MUL ecx
ADD [y], eax

x contains low dword, y high dword

Theory of this: This is all elementary school math. Suppose we want to multiply two two-digit numbers:
59 * 28 = 9 * 8 + 5 * 8 * 10 + 9 * 2 * 10 + 5 * 2 * 100
The last product can be excluded, because it will not affect the final two-digit result.
59 * 28 = 9 * 8 + 5 * 8 * 10 + 9 * 2 * 10
The same principle can be used on 64-bit numbers, one digit becoming one dword.

Dividing a 64-bit number by another 64-bit number:

This is really tricky. You can't do it the same way as in elementary school, because in elementary school math you'll have to handle more than one digit as one number. I couldn't make my own implementation, so I searched for and found some code on the 'net:

;
; _ulldiv divides two unsigned long long numbers, the dividend and
; the divisor resulting in a quotient and a remainder.
;
; input:
;   edx:eax = dividend
;   ecx:ebx = divisor
;
; output:
;   edx:eax = quotient of division of dividend by divisor
;   ecx:ebx = remainder of division of dividend by divisor
;
; destroys:
;   eflags
;

PROC    _ulldiv NEAR

       test    ecx, ecx         ; divisor > 2^32-1 ?
       jnz     $big_divisor     ; yes, divisor > 32^32-1
       cmp     edx, ebx         ; only one division needed ? (ecx = 0)
       jb      $one_div         ; yes, one division sufficient
       mov     ecx, eax         ; save dividend-lo in ecx
       mov     eax, edx         ; get dividend-hi
       xor     edx, edx         ; zero extend it into edx:eax
       div     ebx              ; quotient-hi in eax
       xchg    eax, ecx         ; ecx = quotient-hi, eax =dividend-lo

$one_div:    

       div     ebx              ; eax = quotient-lo
       mov     ebx, edx         ; ebx = remainder-lo
       mov     edx, ecx         ; edx = quotient-hi(quotient in edx:eax)
       xor     ecx, ecx         ; ecx = remainder-hi (rem. in ecx:ebx)
       ret

$big_divisor:

       push    esi              ; save temp
       push    edi              ;  variables
       push    edx              ; save
       push    eax              ;  dividend
       mov     esi, ebx         ; divisor now in
       mov     edi, ecx         ;  edi:ebx and ecx:esi
       shr     edx, 1           ; shift both
       rcr     eax, 1           ;  divisor and
       ror     edi, 1           ;   and dividend
       rcr     ebx, 1           ;    right by 1 bit
       bsr     ecx, ecx         ; ecx = number of remaining shifts
       shrd    ebx, edi, CL     ; scale down divisor and
       shrd    eax, edx, CL     ;   dividend such that divisor
       shr     edx, CL          ;    less than 2^32 (i.e. fits in ebx)
       rol     edi, 1           ; restore original divisor (edi:esi)
       div     ebx              ; compute quotient
       pop     ebx              ; get dividend lo-word
       mov     ecx, eax         ; save quotient
       imul    edi, eax         ; quotient * divisor hi-word (low only)
       mul     esi              ; quotient * divisor lo-word
       add     edx, edi         ; edx:eax = quotient * divisor
       sub     ebx, eax         ; dividend-lo - (quot.*divisor)-lo
       mov     eax, ecx         ; get quotient
       pop     ecx              ; restore dividend hi-word
       sbb     ecx, edx         ; subtract divisor * quot. from dividend
       sbb     edx, edx         ; 0 if remainder > 0, else FFFFFFFFh
       and     esi, edx         ; nothing to add
       and     edi, edx         ;  back if remainder positive
       add     ebx, esi         ; correct remaider
       adc     ecx, edi         ;  and quotient if
       add     eax, edx         ;   necessary
       xor     edx, edx         ; clear hi-word of quot (eax<=FFFFFFFFh)
       pop     edi              ; restore temp
       pop     esi              ;  variables
       ret

ENDP    _ulldiv

;Credit: Norbert Juffa

Anyway, if you just want to divide a 64-bit number by a 32-bit number, use the div or idiv instruction. As you probably know, they divide the qword in EDX:EAX by any other 32-bit value.

Alexander Toresson